{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Copyright (c) Microsoft Corporation. All rights reserved.*\n",
    "\n",
    "*Licensed under the MIT License.*\n",
    "\n",
    "# The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "MT-DNN is an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models. Built upon PyTorch and Transformers, MT-DNN is designed to facilitate rapid\n",
    "customization for a broad spectrum of NLU tasks, using a variety of objectives (classification, regression, structured prediction) and text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm. To enable efficient production deployment, MT-DNN supports multitask knowledge distillation, which can substantially compress a deep neural model without significant performance drop. We demonstrate the effectiveness of MT-DNN on a wide range of NLU applications across general and biomedical domains. The pip installable package and pretrained models will be publicly available at https://github.com/microsoft/mt-dnn."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Design"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "MT-DNN is designed for modularity, flexibility, and ease of use. Its modules are built upon PyTorch (Paszke et al., 2019) and Transformers (Wolf\n",
    "et al., 2019), allowing the use of state-of-the-art (SOTA) pretrained models, e.g., BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019c), and UniLM (Dong\n",
    "et al., 2019). The unique attribute of this package is a flexible interface for adversarial multi-task fine-tuning and knowledge distillation, so that researchers and developers can build large SOTA NLU models and then compress them into small ones\n",
    "for online deployment. The overall workflow and system architecture are shown in the figures below.\n",
    "\n",
    "\n",
    "![Workflow Design](https://nlpbp.blob.core.windows.net/images/mt-dnn2.JPG)\n",
    "\n",
    "The figure above shows the workflow of MT-DNN: first, train a neural language model on a large amount of unlabeled raw text\n",
    "to obtain general contextual representations; then fine-tune the learned contextual representations on downstream tasks, e.g., GLUE (Wang et al., 2018); lastly, distill this large model into a lighter one for online deployment. In the latter two phases, we can leverage multi-task learning and adversarial training to further improve performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Architecture"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![overall_arch](https://nlpbp.blob.core.windows.net/images/mt-dnn.png)\n",
    "\n",
    "The figure above shows the overall system architecture. The lower layers are shared across all tasks, while the top layers are task-specific. The input X (either a sentence or a set of sentences) is first represented as a sequence of embedding\n",
    "vectors, one for each word, in l1. Then the encoder, e.g., a Transformer or a recurrent neural network (LSTM) model,\n",
    "captures the contextual information for each word and generates the shared contextual embedding vectors in l2.\n",
    "Finally, for each task, additional task-specific layers generate task-specific representations, followed by the operations\n",
    "necessary for classification, similarity scoring, or relevance ranking. For adversarial training, we perturb the\n",
    "embeddings from the lexicon encoder and add an extra loss term during training. Note that no perturbations are\n",
    "applied during inference."
   ]
  },
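  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The layout described above can be illustrated with a minimal PyTorch sketch. It assumes a toy vocabulary and a small LSTM encoder in place of BERT, and the names `SharedEncoderMultiTask` and `adversarial_loss` are hypothetical illustrations, not part of the MT-DNN API. The embedding layer plays the role of l1, the LSTM produces the shared contextual embeddings of l2, and each task gets its own output layer. The adversarial term is a simplified single-step variant (perturb the embeddings along the normalized loss gradient, then penalize the KL divergence between clean and perturbed predictions); it is not necessarily the exact objective that MT-DNN implements. As in the text, the perturbation is only used during training, not during inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "\n",
    "class SharedEncoderMultiTask(nn.Module):\n",
    "    # Hypothetical toy model: shared lower layers (l1, l2) plus one output layer per task\n",
    "    def __init__(self, vocab_size, hidden_size, n_classes_per_task):\n",
    "        super().__init__()\n",
    "        self.embedding = nn.Embedding(vocab_size, hidden_size)  # l1: one vector per token\n",
    "        self.encoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)  # l2: shared contextual encoder\n",
    "        self.heads = nn.ModuleList([nn.Linear(hidden_size, n) for n in n_classes_per_task])  # task-specific layers\n",
    "\n",
    "    def forward_from_embeddings(self, embeddings, task_id):\n",
    "        context, _ = self.encoder(embeddings)  # shared contextual embeddings\n",
    "        summary = context[:, 0]  # first position used as a simple sequence summary\n",
    "        return self.heads[task_id](summary)  # task-specific logits\n",
    "\n",
    "    def forward(self, token_ids, task_id):\n",
    "        return self.forward_from_embeddings(self.embedding(token_ids), task_id)\n",
    "\n",
    "\n",
    "def adversarial_loss(model, token_ids, labels, task_id, epsilon=1e-2):\n",
    "    # Task loss plus a simplified adversarial term (illustrative, not MT-DNN's objective)\n",
    "    embeddings = model.embedding(token_ids)\n",
    "    clean_logits = model.forward_from_embeddings(embeddings, task_id)\n",
    "    task_loss = F.cross_entropy(clean_logits, labels)\n",
    "\n",
    "    # Perturb the lexicon embeddings along the normalized gradient of the task loss\n",
    "    (grad,) = torch.autograd.grad(task_loss, embeddings, retain_graph=True)\n",
    "    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)\n",
    "    adv_logits = model.forward_from_embeddings(embeddings + delta.detach(), task_id)\n",
    "\n",
    "    # Extra loss term: predictions on perturbed embeddings should stay close to the clean ones\n",
    "    adv_term = F.kl_div(\n",
    "        F.log_softmax(adv_logits, dim=-1),\n",
    "        F.softmax(clean_logits.detach(), dim=-1),\n",
    "        reduction='batchmean',\n",
    "    )\n",
    "    return task_loss + adv_term\n",
    "\n",
    "\n",
    "# Two toy tasks: a 3-way task (like MNLI) and a 2-way task\n",
    "model_sketch = SharedEncoderMultiTask(vocab_size=100, hidden_size=16, n_classes_per_task=[3, 2])\n",
    "toy_tokens = torch.randint(0, 100, (4, 7))  # batch of 4 sequences of 7 token IDs\n",
    "toy_labels = torch.tensor([0, 1, 2, 0])\n",
    "\n",
    "# Training: the perturbation and the extra loss term are used\n",
    "total_loss = adversarial_loss(model_sketch, toy_tokens, toy_labels, task_id=0)\n",
    "total_loss.backward()  # gradients reach the shared layers and the task-0 head\n",
    "print(total_loss.item())\n",
    "\n",
    "# Inference: plain forward pass, no perturbation\n",
    "with torch.no_grad():\n",
    "    preds = model_sketch(toy_tokens, task_id=1).argmax(dim=-1)\n",
    "print(preds)"
   ]
  },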
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "In this notebook, we fine-tune and evaluate MT-DNN models on the [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) dataset.  \n",
    "\n",
    "### Running Time\n",
    "\n",
    "This is a __computationally intensive__ notebook that runs on the entire MNLI dataset: the training set plus the matched and mismatched development and test sets.  \n",
    "\n",
    "The table below provides a reference running time on a GPU machine.  \n",
    "\n",
    "|Dataset|MULTI_GPU_ON|Machine Configurations|Running time|\n",
    "|:------|:---------|:----------------------|:------------|\n",
    "|MultiNLI|True|4 NVIDIA Tesla K80 GPUs, 24GB GPU memory| ~ 20 hours |\n",
    "\n",
    "If you run into a `CUDA out of memory` error or the Jupyter kernel dies repeatedly, try reducing `BATCH_SIZE` and `MAX_SEQ_LEN`, which are passed to `MTDNNConfig`; note that this may reduce model performance.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Text Classification of MultiNLI Sentences using MT-DNN\n",
    "\n",
    "This notebook uses the pip-installable package that implements the Multi-Task Deep Neural Network Toolkit (MTDNN) for Natural Language Understanding. Because it is very computationally intensive, we recommend running this notebook on a GPU machine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "import shutil\n",
    "import sys\n",
    "from tempfile import TemporaryDirectory\n",
    "\n",
    "import pandas as pd\n",
    "import torch\n",
    "\n",
    "from mtdnn.common.types import EncoderModelType\n",
    "from mtdnn.configuration_mtdnn import MTDNNConfig\n",
    "from mtdnn.data_builder_mtdnn import MTDNNDataBuilder\n",
    "from mtdnn.modeling_mtdnn import MTDNNModel\n",
    "from mtdnn.process_mtdnn import MTDNNDataProcess\n",
    "from mtdnn.tasks.config import MTDNNTaskDefs\n",
    "from mtdnn.tokenizer_mtdnn import MTDNNTokenizer\n",
    "from utils_nlp.dataset.multinli import download_tsv_files_and_extract"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Configuration, Tasks and Model Objects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Working directories for data, model checkpoints and TensorBoard logs\n",
    "# Keep a reference to the TemporaryDirectory object; an unreferenced one is cleaned up as soon as it is garbage-collected\n",
    "TMP_DIR = TemporaryDirectory()\n",
    "ROOT_DIR = TMP_DIR.name\n",
    "\n",
    "OUTPUT_DIR = os.path.join(ROOT_DIR, 'checkpoint')\n",
    "os.makedirs(OUTPUT_DIR, exist_ok=True)\n",
    "\n",
    "LOG_DIR = os.path.join(ROOT_DIR, 'tensorboard_logdir')\n",
    "os.makedirs(LOG_DIR, exist_ok=True)\n",
    "\n",
    "DATA_DIR = os.path.join(ROOT_DIR, 'data')\n",
    "os.makedirs(DATA_DIR, exist_ok=True)\n",
    "\n",
    "# Directory where the MNLI files are extracted\n",
    "DATA_SOURCE_DIR = os.path.join(DATA_DIR, \"MNLI\")\n",
    "\n",
    "# Training parameters\n",
    "BATCH_SIZE = 16\n",
    "MULTI_GPU_ON = True\n",
    "MAX_SEQ_LEN = 128\n",
    "NUM_EPOCHS = 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Print the locations where the data will be downloaded, model checkpoints saved, and logs written."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/tmp/tmpd9ok4aeo/data\n",
      "/tmp/tmpd9ok4aeo/checkpoint\n",
      "/tmp/tmpd9ok4aeo/tensorboard_logdir\n"
     ]
    }
   ],
   "source": [
    "print(DATA_DIR)\n",
    "print(OUTPUT_DIR)\n",
    "print(LOG_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read Dataset\n",
    "We start by loading the data. The following function also downloads and extracts the files if they don't already exist in the data folder.\n",
    "\n",
    "The MultiNLI dataset is mainly used for natural language inference (NLI) tasks, where the inputs are sentence pairs (a premise and a hypothesis) and the labels indicate whether the premise entails the hypothesis, contradicts it, or is neutral."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 305k/305k [00:05<00:00, 54.1kKB/s] \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloaded file to:  /tmp/tmpd9ok4aeo/data/MNLI\n"
     ]
    }
   ],
   "source": [
    "download_tsv_files_and_extract(DATA_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define a Configuration Object \n",
    "\n",
    "Create a model configuration object, `MTDNNConfig`, with the parameters needed to initialize the MT-DNN model. Initializing it without any arguments falls back to a default configuration for a BERT model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "config = MTDNNConfig(batch_size=BATCH_SIZE, \n",
    "                     max_seq_len=MAX_SEQ_LEN, \n",
    "                     multi_gpu_on=MULTI_GPU_ON)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Create Task Definition Object  \n",
    "\n",
    "Define the task parameters to train on and use them to initialize an `MTDNNTaskDefs` object. The definition can contain a single task or multiple tasks. `MTDNNTaskDefs` accepts the task definition(s) as a Python dict or as a YAML or JSON file.\n",
    "\n",
    "The data source directory is where `download_tsv_files_and_extract` downloaded and extracted the data above, i.e., the `MNLI` directory under the temporary `DATA_DIR`.\n",
    "\n",
    "The `data_process_opts` entry of each task sets the options that drive that task's preprocessing.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:07:58 - mtdnn.tasks.config - INFO - Mapping Task attributes\n",
      "06/26/2020 08:07:58 - mtdnn.tasks.config - INFO - Configured task definitions - ['mnli']\n"
     ]
    }
   ],
   "source": [
    "tasks_params = {\n",
    "    \"mnli\": {\n",
    "        \"data_format\": \"PremiseAndOneHypothesis\",\n",
    "        \"encoder_type\": \"BERT\",\n",
    "        \"dropout_p\": 0.3,\n",
    "        \"enable_san\": True,\n",
    "        \"labels\": [\"contradiction\", \"neutral\", \"entailment\"],\n",
    "        \"metric_meta\": [\"ACC\"],\n",
    "        \"loss\": \"CeCriterion\",\n",
    "        \"kd_loss\": \"MseCriterion\",\n",
    "        \"n_class\": 3,\n",
    "        \"split_names\": [\n",
    "            \"train\",\n",
    "            \"dev_matched\",\n",
    "            \"dev_mismatched\",\n",
    "            \"test_matched\",\n",
    "            \"test_mismatched\",\n",
    "        ],\n",
    "        \"data_source_dir\": DATA_SOURCE_DIR,\n",
    "        \"data_process_opts\": {\"header\": True, \"is_train\": True, \"multi_snli\": False,},\n",
    "        \"task_type\": \"Classification\",\n",
    "    },\n",
    "}\n",
    "\n",
    "# Define the tasks\n",
    "task_defs = MTDNNTaskDefs(tasks_params)"
   ]
  },
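  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The task definition above sets `kd_loss` to `MseCriterion`. As a rough illustration of MSE-based knowledge distillation (an assumption about the general idea, not the source of MT-DNN's `MseCriterion`), the sketch below fits hypothetical student logits to fixed teacher logits for the three MNLI labels with plain gradient descent. In real distillation, the student logits would come from a smaller model and the teacher logits from the large fine-tuned model; here the student is just a free tensor so the effect of the loss is easy to see."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn.functional as F\n",
    "\n",
    "# Hypothetical teacher logits for a batch of 4 examples and the 3 MNLI labels\n",
    "teacher_logits = torch.randn(4, 3)\n",
    "\n",
    "# Stand-in for a student model's outputs: free parameters initialized at zero\n",
    "student_logits = torch.zeros(4, 3, requires_grad=True)\n",
    "kd_optimizer = torch.optim.SGD([student_logits], lr=0.5)\n",
    "\n",
    "initial_kd_loss = F.mse_loss(student_logits, teacher_logits).item()\n",
    "for _ in range(50):\n",
    "    kd_optimizer.zero_grad()\n",
    "    # MSE distillation loss: mean squared difference between student and teacher logits\n",
    "    kd_loss = F.mse_loss(student_logits, teacher_logits)\n",
    "    kd_loss.backward()\n",
    "    kd_optimizer.step()\n",
    "\n",
    "final_kd_loss = F.mse_loss(student_logits, teacher_logits).item()\n",
    "print(initial_kd_loss, final_kd_loss)  # the student logits move toward the teacher logits"
   ]
  },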
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Create the MTDNN Data Tokenizer Object  \n",
    "\n",
    "Create a data tokenizing object, `MTDNNTokenizer`. Based on the model's initial checkpoint, it wraps the corresponding Hugging Face Transformers tokenizer to encode the data into the MT-DNN format. The encoded data becomes the input to the data building stage.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "tokenizer = MTDNNTokenizer(do_lower_case=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Testing the tokenizer's `encode` function on sample text\n",
    "Encode a sentence pair with `tokenizer.encode(\"What NLP toolkit do you recommend\", \"MT-DNN is a fantastic toolkit\")`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "([101, 2054, 17953, 2361, 6994, 23615, 2079, 2017, 16755, 102, 11047, 1011, 1040, 10695, 2003, 1037, 10392, 6994, 23615, 102], None, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])\n"
     ]
    }
   ],
   "source": [
    "print(tokenizer.encode(\"What NLP toolkit do you recommend\", \"MT-DNN is a fantastic toolkit\"))"
   ]
  },
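  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The structure of the printed tuple can be checked programmatically. This sketch copies the output above so the cell runs on its own. The layout it assumes, (token IDs, an unused field, segment IDs), is inferred from the printed values rather than taken from MT-DNN's documentation. IDs 101 and 102 are the `[CLS]` and `[SEP]` tokens of the `bert-base-uncased` vocabulary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Output of tokenizer.encode(...) copied from the cell above\n",
    "encoded = (\n",
    "    [101, 2054, 17953, 2361, 6994, 23615, 2079, 2017, 16755, 102,\n",
    "     11047, 1011, 1040, 10695, 2003, 1037, 10392, 6994, 23615, 102],\n",
    "    None,\n",
    "    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n",
    ")\n",
    "# Assumed layout: (token IDs, unused field, segment IDs)\n",
    "input_ids, unused_field, segment_ids = encoded\n",
    "\n",
    "CLS_ID, SEP_ID = 101, 102  # [CLS] and [SEP] in the bert-base-uncased vocabulary\n",
    "\n",
    "# Each text is closed by a [SEP] token\n",
    "sep_positions = [i for i, token_id in enumerate(input_ids) if token_id == SEP_ID]\n",
    "first_text_len = sep_positions[0] + 1  # [CLS] + first text + [SEP]\n",
    "\n",
    "# Segment IDs are 0 up to and including the first [SEP], then 1 for the second text\n",
    "segments_ok = (\n",
    "    segment_ids[:first_text_len] == [0] * first_text_len\n",
    "    and all(s == 1 for s in segment_ids[first_text_len:])\n",
    ")\n",
    "print('starts with [CLS]:', input_ids[0] == CLS_ID)\n",
    "print('[SEP] positions:', sep_positions)\n",
    "print('segment IDs switch from 0 to 1 after the first [SEP]:', segments_ok)"
   ]
  },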
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Preprocessing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the Data Builder Object  \n",
    "\n",
    "Create a data builder object, `MTDNNDataBuilder`. This class uses the model tokenizer to convert each task's data into the vectorized MT-DNN format, depending on the task.  \n",
    "\n",
    "The vectorized data it builds is then used to create the PyTorch dataloaders for training, development, and test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:08:01 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 392702 samples for mnli at /tmp/tmpd9ok4aeo/data/canonical_data/mnli_train.tsv\n",
      "06/26/2020 08:08:01 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 9815 samples for mnli at /tmp/tmpd9ok4aeo/data/canonical_data/mnli_dev_matched.tsv\n",
      "06/26/2020 08:08:01 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 9832 samples for mnli at /tmp/tmpd9ok4aeo/data/canonical_data/mnli_dev_mismatched.tsv\n",
      "06/26/2020 08:08:01 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 9796 samples for mnli at /tmp/tmpd9ok4aeo/data/canonical_data/mnli_test_matched.tsv\n",
      "06/26/2020 08:08:01 - mtdnn.data_builder_mtdnn - INFO - Sucessfully loaded and built 9847 samples for mnli at /tmp/tmpd9ok4aeo/data/canonical_data/mnli_test_mismatched.tsv\n",
      "mnli_train\n",
      "06/26/2020 08:08:01 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'MNLI TRAIN' Task\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Building Data For Premise and One Hypothesis: 392702it [05:19, 1228.85it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:13:22 - mtdnn.data_builder_mtdnn - INFO - Saving data to /tmp/tmpd9ok4aeo/data/canonical_data/bert_base_uncased/mnli_train.json\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Saving Data For PremiseAndOneHypothesis: 100%|██████████| 392702/392702 [00:05<00:00, 70762.79it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mnli_dev_matched\n",
      "06/26/2020 08:13:28 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'MNLI DEV MATCHED' Task\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Building Data For Premise and One Hypothesis: 9815it [00:09, 1017.29it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:13:38 - mtdnn.data_builder_mtdnn - INFO - Saving data to /tmp/tmpd9ok4aeo/data/canonical_data/bert_base_uncased/mnli_dev_matched.json\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Saving Data For PremiseAndOneHypothesis: 100%|██████████| 9815/9815 [00:00<00:00, 66741.94it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mnli_dev_mismatched\n",
      "06/26/2020 08:13:38 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'MNLI DEV MISMATCHED' Task\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Building Data For Premise and One Hypothesis: 9832it [00:08, 1207.60it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:13:46 - mtdnn.data_builder_mtdnn - INFO - Saving data to /tmp/tmpd9ok4aeo/data/canonical_data/bert_base_uncased/mnli_dev_mismatched.json\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Saving Data For PremiseAndOneHypothesis: 100%|██████████| 9832/9832 [00:00<00:00, 72382.99it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mnli_test_matched\n",
      "06/26/2020 08:13:46 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'MNLI TEST MATCHED' Task\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Building Data For Premise and One Hypothesis: 9796it [00:07, 1243.12it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:13:54 - mtdnn.data_builder_mtdnn - INFO - Saving data to /tmp/tmpd9ok4aeo/data/canonical_data/bert_base_uncased/mnli_test_matched.json\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Saving Data For PremiseAndOneHypothesis: 100%|██████████| 9796/9796 [00:00<00:00, 73680.61it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mnli_test_mismatched\n",
      "06/26/2020 08:13:54 - mtdnn.data_builder_mtdnn - INFO - Building Data For 'MNLI TEST MISMATCHED' Task\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Building Data For Premise and One Hypothesis: 9847it [00:08, 1195.65it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:14:02 - mtdnn.data_builder_mtdnn - INFO - Saving data to /tmp/tmpd9ok4aeo/data/canonical_data/bert_base_uncased/mnli_test_mismatched.json\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "Saving Data For PremiseAndOneHypothesis: 100%|██████████| 9847/9847 [00:00<00:00, 67993.64it/s]\n"
     ]
    }
   ],
   "source": [
    "# Load and build the data for each task\n",
    "data_builder = MTDNNDataBuilder(\n",
    "    tokenizer=tokenizer,\n",
    "    task_defs=task_defs,\n",
    "    data_dir=DATA_DIR,\n",
    "    canonical_data_suffix=\"canonical_data\",\n",
    "    dump_rows=True,\n",
    ")\n",
    "\n",
    "# Build the data into the MT-DNN format:\n",
    "# an iterable of each task and its processed data\n",
    "vectorized_data = data_builder.vectorize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the Data Processing Object  \n",
    "\n",
    "Create a data processing object, `MTDNNDataProcess`. It creates the PyTorch dataloaders for training, development, and test, and it provides the training options needed to initialize the model correctly for all tasks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:14:03 - mtdnn.process_mtdnn - INFO - Starting to process the training data sets\n",
      "06/26/2020 08:14:03 - mtdnn.process_mtdnn - INFO - Loading mnli_train as task 0\n",
      "06/26/2020 08:14:03 - mtdnn.dataset_mtdnn - INFO - Loaded 391533 samples out of 392702\n",
      "06/26/2020 08:14:03 - mtdnn.process_mtdnn - INFO - Starting to process the testing data sets\n",
      "06/26/2020 08:14:03 - mtdnn.process_mtdnn - INFO - Loading mnli_dev_matched as task 0\n",
      "06/26/2020 08:14:03 - mtdnn.dataset_mtdnn - INFO - Loaded 9815 samples out of 9815\n",
      "06/26/2020 08:14:03 - mtdnn.process_mtdnn - INFO - Loading mnli_dev_mismatched as task 0\n",
      "06/26/2020 08:14:03 - mtdnn.dataset_mtdnn - INFO - Loaded 9832 samples out of 9832\n",
      "06/26/2020 08:14:03 - mtdnn.process_mtdnn - INFO - Loading mnli_test_matched as task 0\n",
      "06/26/2020 08:14:03 - mtdnn.dataset_mtdnn - INFO - Loaded 9796 samples out of 9796\n",
      "06/26/2020 08:14:03 - mtdnn.process_mtdnn - INFO - Loading mnli_test_mismatched as task 0\n",
      "06/26/2020 08:14:03 - mtdnn.dataset_mtdnn - INFO - Loaded 9847 samples out of 9847\n"
     ]
    }
   ],
   "source": [
    "# Process the data and update the config with values derived from the training data\n",
    "data_processor = MTDNNDataProcess(\n",
    "    config=config, task_defs=task_defs, vectorized_data=vectorized_data\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Retrieve the processed multi-task dataloaders for training, development, and test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "multitask_train_dataloader = data_processor.get_train_dataloader()\n",
    "dev_dataloaders_list = data_processor.get_dev_dataloaders()\n",
    "test_dataloaders_list = data_processor.get_test_dataloaders()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now retrieve the training options from the processor; they are used to initialize the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "decoder_opts = data_processor.get_decoder_options_list()\n",
    "task_types = data_processor.get_task_types_list()\n",
    "dropout_list = data_processor.get_tasks_dropout_prob_list()\n",
    "loss_types = data_processor.get_loss_types_list()\n",
    "kd_loss_types = data_processor.get_kd_loss_types_list()\n",
    "tasks_nclass_list = data_processor.get_task_nclass_list()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Retrieve the total number of training batches, which is passed to the model as `num_train_step`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_all_batches = data_processor.get_num_all_batches()"
   ]
  },
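  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check on the batch bookkeeping, the number of training batches per epoch can be computed by hand from the logged sample count: 391,533 of the 392,702 training samples were loaded, and the batch size is 16. Assuming the last, partial batch is kept rather than dropped, the result matches the `Amount of data to go over: 24471` line printed at the start of each epoch in the training log below. This is only an illustrative check; it does not show whether `num_all_batches` counts one epoch or all epochs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "# Values taken from the logs above\n",
    "n_train_loaded = 391533  # 'Loaded 391533 samples out of 392702'\n",
    "train_batch_size = 16    # same value as BATCH_SIZE\n",
    "\n",
    "# Assumes the last, partial batch is kept rather than dropped\n",
    "batches_per_epoch = math.ceil(n_train_loaded / train_batch_size)\n",
    "leftover = n_train_loaded % train_batch_size  # size of the last, partial batch\n",
    "print(batches_per_epoch, leftover)"
   ]
  },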
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Instantiate the MTDNN Model\n",
    "\n",
    "Now create the `MTDNNModel`, passing in the configuration, task definitions, training options, and dataloaders defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "idx: 0, number of task labels: 3\n"
     ]
    }
   ],
   "source": [
    "model = MTDNNModel(\n",
    "    config,\n",
    "    task_defs,\n",
    "    pretrained_model_name=\"bert-base-uncased\",\n",
    "    num_train_step=num_all_batches,\n",
    "    decoder_opts=decoder_opts,\n",
    "    task_types=task_types,\n",
    "    dropout_list=dropout_list,\n",
    "    loss_types=loss_types,\n",
    "    kd_loss_types=kd_loss_types,\n",
    "    tasks_nclass_list=tasks_nclass_list,\n",
    "    multitask_train_dataloader=multitask_train_dataloader,\n",
    "    dev_dataloaders_list=dev_dataloaders_list,\n",
    "    test_dataloaders_list=test_dataloaders_list,\n",
    "    output_dir=OUTPUT_DIR,\n",
    "    log_dir=LOG_DIR \n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Fine-tuning, Prediction and Evaluation\n",
    "\n",
    "### Fine-tune the model for five epochs and predict on the test sets  \n",
    "\n",
    "At this point, we can fit the MT-DNN model and use it to make predictions. The `fit` method takes an optional `epochs` parameter that overrides the number of epochs set in the `MTDNNConfig` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/26/2020 08:14:07 - mtdnn.modeling_mtdnn - INFO - Total number of params: 109484547\n",
      "06/26/2020 08:14:07 - mtdnn.modeling_mtdnn - INFO - At epoch 0\n",
      "06/26/2020 08:14:07 - mtdnn.modeling_mtdnn - INFO - Amount of data to go over: 24471\n",
      "06/26/2020 08:14:13 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [     1] Training Loss - [1.63923] Time Remaining - [1 day, 13:33:33]\n",
      "06/26/2020 08:19:40 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [   500] Training Loss - [1.32204] Time Remaining - [4:25:55]\n",
      "06/26/2020 08:25:09 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  1000] Training Loss - [1.21343] Time Remaining - [4:18:55]\n",
      "06/26/2020 08:30:42 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  1500] Training Loss - [1.16369] Time Remaining - [4:13:52]\n",
      "06/26/2020 08:36:15 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  2000] Training Loss - [1.12522] Time Remaining - [4:08:36]\n",
      "06/26/2020 08:41:48 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  2500] Training Loss - [1.07541] Time Remaining - [4:03:18]\n",
      "06/26/2020 08:47:23 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  3000] Training Loss - [1.03195] Time Remaining - [3:58:03]\n",
      "06/26/2020 08:52:55 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  3500] Training Loss - [0.99050] Time Remaining - [3:52:26]\n",
      "06/26/2020 08:58:28 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  4000] Training Loss - [0.95599] Time Remaining - [3:46:59]\n",
      "06/26/2020 09:04:03 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  4500] Training Loss - [0.92721] Time Remaining - [3:41:34]\n",
      "06/26/2020 09:09:37 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  5000] Training Loss - [0.90235] Time Remaining - [3:36:07]\n",
      "06/26/2020 09:15:10 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  5500] Training Loss - [0.87961] Time Remaining - [3:30:33]\n",
      "06/26/2020 09:20:41 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  6000] Training Loss - [0.85982] Time Remaining - [3:24:55]\n",
      "06/26/2020 09:26:15 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  6500] Training Loss - [0.84107] Time Remaining - [3:19:24]\n",
      "06/26/2020 09:31:47 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  7000] Training Loss - [0.82505] Time Remaining - [3:13:51]\n",
      "06/26/2020 09:37:23 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  7500] Training Loss - [0.81009] Time Remaining - [3:08:25]\n",
      "06/26/2020 09:42:55 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  8000] Training Loss - [0.79706] Time Remaining - [3:02:49]\n",
      "06/26/2020 09:48:26 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  8500] Training Loss - [0.78522] Time Remaining - [2:57:13]\n",
      "06/26/2020 09:54:00 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  9000] Training Loss - [0.77296] Time Remaining - [2:51:42]\n",
      "06/26/2020 09:59:34 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [  9500] Training Loss - [0.76185] Time Remaining - [2:46:11]\n",
      "06/26/2020 10:05:11 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 10000] Training Loss - [0.75168] Time Remaining - [2:40:42]\n",
      "06/26/2020 10:10:46 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 10500] Training Loss - [0.74186] Time Remaining - [2:35:11]\n",
      "06/26/2020 10:16:17 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 11000] Training Loss - [0.73347] Time Remaining - [2:29:37]\n",
      "06/26/2020 10:21:50 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 11500] Training Loss - [0.72535] Time Remaining - [2:24:03]\n",
      "06/26/2020 10:27:24 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 12000] Training Loss - [0.71798] Time Remaining - [2:18:30]\n",
      "06/26/2020 10:32:56 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 12500] Training Loss - [0.71132] Time Remaining - [2:12:56]\n",
      "06/26/2020 10:38:30 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 13000] Training Loss - [0.70462] Time Remaining - [2:07:23]\n",
      "06/26/2020 10:44:02 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 13500] Training Loss - [0.69882] Time Remaining - [2:01:49]\n",
      "06/26/2020 10:49:35 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 14000] Training Loss - [0.69229] Time Remaining - [1:56:16]\n",
      "06/26/2020 10:55:08 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 14500] Training Loss - [0.68647] Time Remaining - [1:50:43]\n",
      "06/26/2020 11:00:42 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 15000] Training Loss - [0.68061] Time Remaining - [1:45:10]\n",
      "06/26/2020 11:06:18 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 15500] Training Loss - [0.67555] Time Remaining - [1:39:39]\n",
      "06/26/2020 11:11:51 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 16000] Training Loss - [0.67038] Time Remaining - [1:34:05]\n",
      "06/26/2020 11:17:23 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 16500] Training Loss - [0.66557] Time Remaining - [1:28:32]\n",
      "06/26/2020 11:22:54 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 17000] Training Loss - [0.66106] Time Remaining - [1:22:57]\n",
      "06/26/2020 11:28:26 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 17500] Training Loss - [0.65651] Time Remaining - [1:17:24]\n",
      "06/26/2020 11:34:01 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 18000] Training Loss - [0.65221] Time Remaining - [1:11:51]\n",
      "06/26/2020 11:39:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 18500] Training Loss - [0.64808] Time Remaining - [1:06:17]\n",
      "06/26/2020 11:45:03 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 19000] Training Loss - [0.64444] Time Remaining - [1:00:44]\n",
      "06/26/2020 11:50:37 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 19500] Training Loss - [0.64039] Time Remaining - [0:55:11]\n",
      "06/26/2020 11:56:10 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 20000] Training Loss - [0.63708] Time Remaining - [0:49:38]\n",
      "06/27/2020 12:01:45 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 20500] Training Loss - [0.63337] Time Remaining - [0:44:05]\n",
      "06/27/2020 12:07:19 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 21000] Training Loss - [0.62972] Time Remaining - [0:38:32]\n",
      "06/27/2020 12:12:53 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 21500] Training Loss - [0.62656] Time Remaining - [0:32:59]\n",
      "06/27/2020 12:18:27 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 22000] Training Loss - [0.62311] Time Remaining - [0:27:26]\n",
      "06/27/2020 12:23:59 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 22500] Training Loss - [0.62002] Time Remaining - [0:21:53]\n",
      "06/27/2020 12:29:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 23000] Training Loss - [0.61681] Time Remaining - [0:16:20]\n",
      "06/27/2020 12:35:04 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 23500] Training Loss - [0.61411] Time Remaining - [0:10:46]\n",
      "06/27/2020 12:40:36 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 24000] Training Loss - [0.61127] Time Remaining - [0:05:13]\n",
      "06/27/2020 12:45:48 - mtdnn.modeling_mtdnn - INFO - Saving mt-dnn model to /tmp/tmpd9ok4aeo/checkpoint/model_0.pt\n",
      "06/27/2020 12:45:50 - mtdnn.modeling_mtdnn - INFO - model saved to /tmp/tmpd9ok4aeo/checkpoint/model_0.pt\n",
      "06/27/2020 12:45:50 - mtdnn.modeling_mtdnn - INFO - At epoch 1\n",
      "06/27/2020 12:45:50 - mtdnn.modeling_mtdnn - INFO - Amount of data to go over: 24471\n",
      "06/27/2020 12:46:09 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 24500] Training Loss - [0.60860] Time Remaining - [4:31:07]\n",
      "06/27/2020 12:51:44 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 25000] Training Loss - [0.60618] Time Remaining - [4:27:29]\n",
      "06/27/2020 12:57:16 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 25500] Training Loss - [0.60383] Time Remaining - [4:20:36]\n",
      "06/27/2020 01:02:50 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 26000] Training Loss - [0.60122] Time Remaining - [4:15:02]\n",
      "06/27/2020 01:08:22 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 26500] Training Loss - [0.59883] Time Remaining - [4:09:14]\n",
      "06/27/2020 01:13:54 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 27000] Training Loss - [0.59667] Time Remaining - [4:03:36]\n",
      "06/27/2020 01:19:27 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 27500] Training Loss - [0.59434] Time Remaining - [3:58:02]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/27/2020 01:25:00 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 28000] Training Loss - [0.59204] Time Remaining - [3:52:28]\n",
      "06/27/2020 01:30:34 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 28500] Training Loss - [0.58952] Time Remaining - [3:46:57]\n",
      "06/27/2020 01:36:07 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 29000] Training Loss - [0.58707] Time Remaining - [3:41:27]\n",
      "06/27/2020 01:41:39 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 29500] Training Loss - [0.58480] Time Remaining - [3:35:47]\n",
      "06/27/2020 01:47:11 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 30000] Training Loss - [0.58238] Time Remaining - [3:30:10]\n",
      "06/27/2020 01:52:43 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 30500] Training Loss - [0.57984] Time Remaining - [3:24:34]\n",
      "06/27/2020 01:58:16 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 31000] Training Loss - [0.57737] Time Remaining - [3:19:04]\n",
      "06/27/2020 02:03:47 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 31500] Training Loss - [0.57507] Time Remaining - [3:13:25]\n",
      "06/27/2020 02:09:21 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 32000] Training Loss - [0.57277] Time Remaining - [3:07:56]\n",
      "06/27/2020 02:14:52 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 32500] Training Loss - [0.57034] Time Remaining - [3:02:20]\n",
      "06/27/2020 02:20:22 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 33000] Training Loss - [0.56793] Time Remaining - [2:56:42]\n",
      "06/27/2020 02:25:56 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 33500] Training Loss - [0.56548] Time Remaining - [2:51:11]\n",
      "06/27/2020 02:31:30 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 34000] Training Loss - [0.56309] Time Remaining - [2:45:41]\n",
      "06/27/2020 02:37:04 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 34500] Training Loss - [0.56059] Time Remaining - [2:40:11]\n",
      "06/27/2020 02:42:39 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 35000] Training Loss - [0.55799] Time Remaining - [2:34:41]\n",
      "06/27/2020 02:48:11 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 35500] Training Loss - [0.55566] Time Remaining - [2:29:07]\n",
      "06/27/2020 02:53:44 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 36000] Training Loss - [0.55331] Time Remaining - [2:23:34]\n",
      "06/27/2020 02:59:18 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 36500] Training Loss - [0.55091] Time Remaining - [2:18:02]\n",
      "06/27/2020 03:04:50 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 37000] Training Loss - [0.54856] Time Remaining - [2:12:29]\n",
      "06/27/2020 03:10:21 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 37500] Training Loss - [0.54628] Time Remaining - [2:06:55]\n",
      "06/27/2020 03:15:53 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 38000] Training Loss - [0.54413] Time Remaining - [2:01:21]\n",
      "06/27/2020 03:21:26 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 38500] Training Loss - [0.54178] Time Remaining - [1:55:49]\n",
      "06/27/2020 03:27:00 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 39000] Training Loss - [0.53955] Time Remaining - [1:50:16]\n",
      "06/27/2020 03:32:30 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 39500] Training Loss - [0.53732] Time Remaining - [1:44:42]\n",
      "06/27/2020 03:38:05 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 40000] Training Loss - [0.53530] Time Remaining - [1:39:11]\n",
      "06/27/2020 03:43:37 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 40500] Training Loss - [0.53318] Time Remaining - [1:33:38]\n",
      "06/27/2020 03:49:10 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 41000] Training Loss - [0.53105] Time Remaining - [1:28:05]\n",
      "06/27/2020 03:54:41 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 41500] Training Loss - [0.52908] Time Remaining - [1:22:32]\n",
      "06/27/2020 04:00:14 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 42000] Training Loss - [0.52711] Time Remaining - [1:16:59]\n",
      "06/27/2020 04:05:48 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 42500] Training Loss - [0.52516] Time Remaining - [1:11:26]\n",
      "06/27/2020 04:11:18 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 43000] Training Loss - [0.52324] Time Remaining - [1:05:53]\n",
      "06/27/2020 04:16:50 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 43500] Training Loss - [0.52161] Time Remaining - [1:00:20]\n",
      "06/27/2020 04:22:23 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 44000] Training Loss - [0.51970] Time Remaining - [0:54:48]\n",
      "06/27/2020 04:27:54 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 44500] Training Loss - [0.51821] Time Remaining - [0:49:14]\n",
      "06/27/2020 04:33:27 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 45000] Training Loss - [0.51635] Time Remaining - [0:43:42]\n",
      "06/27/2020 04:39:00 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 45500] Training Loss - [0.51451] Time Remaining - [0:38:09]\n",
      "06/27/2020 04:44:33 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 46000] Training Loss - [0.51286] Time Remaining - [0:32:37]\n",
      "06/27/2020 04:50:05 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 46500] Training Loss - [0.51112] Time Remaining - [0:27:04]\n",
      "06/27/2020 04:55:37 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 47000] Training Loss - [0.50952] Time Remaining - [0:21:31]\n",
      "06/27/2020 05:01:08 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 47500] Training Loss - [0.50789] Time Remaining - [0:15:59]\n",
      "06/27/2020 05:06:39 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 48000] Training Loss - [0.50631] Time Remaining - [0:10:26]\n",
      "06/27/2020 05:12:12 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 48500] Training Loss - [0.50469] Time Remaining - [0:04:53]\n",
      "06/27/2020 05:17:06 - mtdnn.modeling_mtdnn - INFO - Saving mt-dnn model to /tmp/tmpd9ok4aeo/checkpoint/model_1.pt\n",
      "06/27/2020 05:17:07 - mtdnn.modeling_mtdnn - INFO - model saved to /tmp/tmpd9ok4aeo/checkpoint/model_1.pt\n",
      "06/27/2020 05:17:07 - mtdnn.modeling_mtdnn - INFO - At epoch 2\n",
      "06/27/2020 05:17:07 - mtdnn.modeling_mtdnn - INFO - Amount of data to go over: 24471\n",
      "06/27/2020 05:17:46 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 49000] Training Loss - [0.50317] Time Remaining - [4:33:15]\n",
      "06/27/2020 05:23:21 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 49500] Training Loss - [0.50171] Time Remaining - [4:26:45]\n",
      "06/27/2020 05:28:53 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 50000] Training Loss - [0.50034] Time Remaining - [4:20:18]\n",
      "06/27/2020 05:34:26 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 50500] Training Loss - [0.49876] Time Remaining - [4:14:39]\n",
      "06/27/2020 05:39:58 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 51000] Training Loss - [0.49731] Time Remaining - [4:08:48]\n",
      "06/27/2020 05:45:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 51500] Training Loss - [0.49601] Time Remaining - [4:03:18]\n",
      "06/27/2020 05:51:06 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 52000] Training Loss - [0.49468] Time Remaining - [3:57:54]\n",
      "06/27/2020 05:56:40 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 52500] Training Loss - [0.49328] Time Remaining - [3:52:27]\n",
      "06/27/2020 06:02:14 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 53000] Training Loss - [0.49179] Time Remaining - [3:46:53]\n",
      "06/27/2020 06:07:48 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 53500] Training Loss - [0.49036] Time Remaining - [3:41:24]\n",
      "06/27/2020 06:13:21 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 54000] Training Loss - [0.48902] Time Remaining - [3:35:48]\n",
      "06/27/2020 06:18:53 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 54500] Training Loss - [0.48761] Time Remaining - [3:30:10]\n",
      "06/27/2020 06:24:25 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 55000] Training Loss - [0.48609] Time Remaining - [3:24:32]\n",
      "06/27/2020 06:29:59 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 55500] Training Loss - [0.48458] Time Remaining - [3:19:02]\n",
      "06/27/2020 06:35:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 56000] Training Loss - [0.48321] Time Remaining - [3:13:27]\n",
      "06/27/2020 06:41:07 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 56500] Training Loss - [0.48176] Time Remaining - [3:07:58]\n",
      "06/27/2020 06:46:38 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 57000] Training Loss - [0.48029] Time Remaining - [3:02:19]\n",
      "06/27/2020 06:52:09 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 57500] Training Loss - [0.47890] Time Remaining - [2:56:41]\n",
      "06/27/2020 06:57:44 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 58000] Training Loss - [0.47741] Time Remaining - [2:51:11]\n",
      "06/27/2020 07:03:17 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 58500] Training Loss - [0.47591] Time Remaining - [2:45:38]\n",
      "06/27/2020 07:08:51 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 59000] Training Loss - [0.47436] Time Remaining - [2:40:06]\n",
      "06/27/2020 07:14:26 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 59500] Training Loss - [0.47282] Time Remaining - [2:34:34]\n",
      "06/27/2020 07:19:59 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 60000] Training Loss - [0.47137] Time Remaining - [2:29:01]\n",
      "06/27/2020 07:25:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 60500] Training Loss - [0.46989] Time Remaining - [2:23:27]\n",
      "06/27/2020 07:31:05 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 61000] Training Loss - [0.46844] Time Remaining - [2:17:54]\n",
      "06/27/2020 07:36:38 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 61500] Training Loss - [0.46691] Time Remaining - [2:12:20]\n",
      "06/27/2020 07:42:12 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 62000] Training Loss - [0.46547] Time Remaining - [2:06:48]\n",
      "06/27/2020 07:47:45 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 62500] Training Loss - [0.46406] Time Remaining - [2:01:14]\n",
      "06/27/2020 07:53:17 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 63000] Training Loss - [0.46261] Time Remaining - [1:55:40]\n",
      "06/27/2020 07:58:50 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 63500] Training Loss - [0.46117] Time Remaining - [1:50:06]\n",
      "06/27/2020 08:04:23 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 64000] Training Loss - [0.45977] Time Remaining - [1:44:33]\n",
      "06/27/2020 08:10:00 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 64500] Training Loss - [0.45842] Time Remaining - [1:39:02]\n",
      "06/27/2020 08:15:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 65000] Training Loss - [0.45711] Time Remaining - [1:33:28]\n",
      "06/27/2020 08:21:05 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 65500] Training Loss - [0.45574] Time Remaining - [1:27:54]\n",
      "06/27/2020 08:26:37 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 66000] Training Loss - [0.45438] Time Remaining - [1:22:20]\n",
      "06/27/2020 08:32:09 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 66500] Training Loss - [0.45316] Time Remaining - [1:16:47]\n",
      "06/27/2020 08:37:44 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 67000] Training Loss - [0.45187] Time Remaining - [1:11:14]\n",
      "06/27/2020 08:43:13 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 67500] Training Loss - [0.45054] Time Remaining - [1:05:40]\n",
      "06/27/2020 08:48:47 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 68000] Training Loss - [0.44935] Time Remaining - [1:00:07]\n",
      "06/27/2020 08:54:20 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 68500] Training Loss - [0.44811] Time Remaining - [0:54:33]\n",
      "06/27/2020 08:59:52 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 69000] Training Loss - [0.44706] Time Remaining - [0:49:00]\n",
      "06/27/2020 09:05:25 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 69500] Training Loss - [0.44582] Time Remaining - [0:43:27]\n",
      "06/27/2020 09:10:59 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 70000] Training Loss - [0.44460] Time Remaining - [0:37:54]\n",
      "06/27/2020 09:16:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 70500] Training Loss - [0.44340] Time Remaining - [0:32:21]\n",
      "06/27/2020 09:22:04 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 71000] Training Loss - [0.44224] Time Remaining - [0:26:47]\n",
      "06/27/2020 09:27:36 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 71500] Training Loss - [0.44109] Time Remaining - [0:21:14]\n",
      "06/27/2020 09:33:10 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 72000] Training Loss - [0.43992] Time Remaining - [0:15:41]\n",
      "06/27/2020 09:38:43 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 72500] Training Loss - [0.43884] Time Remaining - [0:10:08]\n",
      "06/27/2020 09:44:16 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 73000] Training Loss - [0.43772] Time Remaining - [0:04:35]\n",
      "06/27/2020 09:48:52 - mtdnn.modeling_mtdnn - INFO - Saving mt-dnn model to /tmp/tmpd9ok4aeo/checkpoint/model_2.pt\n",
      "06/27/2020 09:48:53 - mtdnn.modeling_mtdnn - INFO - model saved to /tmp/tmpd9ok4aeo/checkpoint/model_2.pt\n",
      "06/27/2020 09:48:53 - mtdnn.modeling_mtdnn - INFO - At epoch 3\n",
      "06/27/2020 09:48:53 - mtdnn.modeling_mtdnn - INFO - Amount of data to go over: 24471\n",
      "06/27/2020 09:49:51 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 73500] Training Loss - [0.43667] Time Remaining - [4:31:14]\n",
      "06/27/2020 09:55:24 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 74000] Training Loss - [0.43569] Time Remaining - [4:24:58]\n",
      "06/27/2020 10:00:54 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 74500] Training Loss - [0.43456] Time Remaining - [4:18:35]\n",
      "06/27/2020 10:06:28 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 75000] Training Loss - [0.43348] Time Remaining - [4:13:25]\n",
      "06/27/2020 10:12:00 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 75500] Training Loss - [0.43240] Time Remaining - [4:07:56]\n",
      "06/27/2020 10:17:31 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 76000] Training Loss - [0.43145] Time Remaining - [4:02:12]\n",
      "06/27/2020 10:23:03 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 76500] Training Loss - [0.43042] Time Remaining - [3:56:40]\n",
      "06/27/2020 10:28:36 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 77000] Training Loss - [0.42942] Time Remaining - [3:51:12]\n",
      "06/27/2020 10:34:09 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 77500] Training Loss - [0.42829] Time Remaining - [3:45:45]\n",
      "06/27/2020 10:39:44 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 78000] Training Loss - [0.42727] Time Remaining - [3:40:23]\n",
      "06/27/2020 10:45:14 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 78500] Training Loss - [0.42634] Time Remaining - [3:34:42]\n",
      "06/27/2020 10:50:46 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 79000] Training Loss - [0.42530] Time Remaining - [3:29:08]\n",
      "06/27/2020 10:56:18 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 79500] Training Loss - [0.42421] Time Remaining - [3:23:36]\n",
      "06/27/2020 11:01:51 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 80000] Training Loss - [0.42316] Time Remaining - [3:18:07]\n",
      "06/27/2020 11:07:23 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 80500] Training Loss - [0.42214] Time Remaining - [3:12:31]\n",
      "06/27/2020 11:12:56 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 81000] Training Loss - [0.42110] Time Remaining - [3:07:02]\n",
      "06/27/2020 11:18:28 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 81500] Training Loss - [0.42001] Time Remaining - [3:01:29]\n",
      "06/27/2020 11:23:59 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 82000] Training Loss - [0.41902] Time Remaining - [2:55:54]\n",
      "06/27/2020 11:29:30 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 82500] Training Loss - [0.41800] Time Remaining - [2:50:20]\n",
      "06/27/2020 11:35:03 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 83000] Training Loss - [0.41688] Time Remaining - [2:44:49]\n",
      "06/27/2020 11:40:36 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 83500] Training Loss - [0.41583] Time Remaining - [2:39:18]\n",
      "06/27/2020 11:46:08 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 84000] Training Loss - [0.41472] Time Remaining - [2:33:45]\n",
      "06/27/2020 11:51:38 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 84500] Training Loss - [0.41364] Time Remaining - [2:28:10]\n",
      "06/27/2020 11:57:10 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 85000] Training Loss - [0.41259] Time Remaining - [2:22:38]\n",
      "06/27/2020 12:02:42 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 85500] Training Loss - [0.41152] Time Remaining - [2:17:06]\n",
      "06/27/2020 12:08:14 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 86000] Training Loss - [0.41049] Time Remaining - [2:11:34]\n",
      "06/27/2020 12:13:44 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 86500] Training Loss - [0.40944] Time Remaining - [2:05:59]\n",
      "06/27/2020 12:19:16 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 87000] Training Loss - [0.40839] Time Remaining - [2:00:27]\n",
      "06/27/2020 12:24:48 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 87500] Training Loss - [0.40739] Time Remaining - [1:54:55]\n",
      "06/27/2020 12:30:21 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 88000] Training Loss - [0.40638] Time Remaining - [1:49:24]\n",
      "06/27/2020 12:35:52 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 88500] Training Loss - [0.40539] Time Remaining - [1:43:51]\n",
      "06/27/2020 12:41:27 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 89000] Training Loss - [0.40443] Time Remaining - [1:38:21]\n",
      "06/27/2020 12:47:00 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 89500] Training Loss - [0.40348] Time Remaining - [1:32:49]\n",
      "06/27/2020 12:52:33 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 90000] Training Loss - [0.40242] Time Remaining - [1:27:17]\n",
      "06/27/2020 12:58:02 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 90500] Training Loss - [0.40147] Time Remaining - [1:21:44]\n",
      "06/27/2020 01:03:34 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 91000] Training Loss - [0.40057] Time Remaining - [1:16:12]\n",
      "06/27/2020 01:09:05 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 91500] Training Loss - [0.39961] Time Remaining - [1:10:39]\n",
      "06/27/2020 01:14:35 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 92000] Training Loss - [0.39874] Time Remaining - [1:05:07]\n",
      "06/27/2020 01:20:06 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 92500] Training Loss - [0.39783] Time Remaining - [0:59:34]\n",
      "06/27/2020 01:25:37 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 93000] Training Loss - [0.39692] Time Remaining - [0:54:02]\n",
      "06/27/2020 01:31:09 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 93500] Training Loss - [0.39617] Time Remaining - [0:48:30]\n",
      "06/27/2020 01:36:42 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 94000] Training Loss - [0.39524] Time Remaining - [0:42:58]\n",
      "06/27/2020 01:42:15 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 94500] Training Loss - [0.39436] Time Remaining - [0:37:26]\n",
      "06/27/2020 01:47:46 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 95000] Training Loss - [0.39353] Time Remaining - [0:31:54]\n",
      "06/27/2020 01:53:18 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 95500] Training Loss - [0.39261] Time Remaining - [0:26:22]\n",
      "06/27/2020 01:58:50 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 96000] Training Loss - [0.39182] Time Remaining - [0:20:50]\n",
      "06/27/2020 02:04:20 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 96500] Training Loss - [0.39099] Time Remaining - [0:15:18]\n",
      "06/27/2020 02:09:51 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 97000] Training Loss - [0.39020] Time Remaining - [0:09:46]\n",
      "06/27/2020 02:15:22 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 97500] Training Loss - [0.38940] Time Remaining - [0:04:14]\n",
      "06/27/2020 02:19:37 - mtdnn.modeling_mtdnn - INFO - Saving mt-dnn model to /tmp/tmpd9ok4aeo/checkpoint/model_3.pt\n",
      "06/27/2020 02:19:38 - mtdnn.modeling_mtdnn - INFO - model saved to /tmp/tmpd9ok4aeo/checkpoint/model_3.pt\n",
      "06/27/2020 02:19:38 - mtdnn.modeling_mtdnn - INFO - At epoch 4\n",
      "06/27/2020 02:19:38 - mtdnn.modeling_mtdnn - INFO - Amount of data to go over: 24471\n",
      "06/27/2020 02:20:57 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 98000] Training Loss - [0.38866] Time Remaining - [4:36:08]\n",
      "06/27/2020 02:26:30 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 98500] Training Loss - [0.38793] Time Remaining - [4:26:03]\n",
      "06/27/2020 02:32:03 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 99000] Training Loss - [0.38710] Time Remaining - [4:20:01]\n",
      "06/27/2020 02:37:37 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [ 99500] Training Loss - [0.38627] Time Remaining - [4:14:15]\n",
      "06/27/2020 02:43:11 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [100000] Training Loss - [0.38549] Time Remaining - [4:08:45]\n",
      "06/27/2020 02:48:44 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [100500] Training Loss - [0.38482] Time Remaining - [4:03:04]\n",
      "06/27/2020 02:54:18 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [101000] Training Loss - [0.38410] Time Remaining - [3:57:37]\n",
      "06/27/2020 02:59:53 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [101500] Training Loss - [0.38333] Time Remaining - [3:52:06]\n",
      "06/27/2020 03:05:27 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [102000] Training Loss - [0.38250] Time Remaining - [3:46:33]\n",
      "06/27/2020 03:11:01 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [102500] Training Loss - [0.38174] Time Remaining - [3:41:01]\n",
      "06/27/2020 03:16:34 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [103000] Training Loss - [0.38101] Time Remaining - [3:35:23]\n",
      "06/27/2020 03:22:06 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [103500] Training Loss - [0.38023] Time Remaining - [3:29:44]\n",
      "06/27/2020 03:27:40 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [104000] Training Loss - [0.37941] Time Remaining - [3:24:10]\n",
      "06/27/2020 03:33:13 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [104500] Training Loss - [0.37866] Time Remaining - [3:18:34]\n",
      "06/27/2020 03:38:45 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [105000] Training Loss - [0.37794] Time Remaining - [3:12:57]\n",
      "06/27/2020 03:44:19 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [105500] Training Loss - [0.37717] Time Remaining - [3:07:24]\n",
      "06/27/2020 03:49:49 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [106000] Training Loss - [0.37640] Time Remaining - [3:01:44]\n",
      "06/27/2020 03:55:22 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [106500] Training Loss - [0.37562] Time Remaining - [2:56:09]\n",
      "06/27/2020 04:00:54 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [107000] Training Loss - [0.37492] Time Remaining - [2:50:34]\n",
      "06/27/2020 04:06:28 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [107500] Training Loss - [0.37413] Time Remaining - [2:45:01]\n",
      "06/27/2020 04:12:01 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [108000] Training Loss - [0.37329] Time Remaining - [2:39:28]\n",
      "06/27/2020 04:17:35 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [108500] Training Loss - [0.37250] Time Remaining - [2:33:56]\n",
      "06/27/2020 04:23:07 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [109000] Training Loss - [0.37170] Time Remaining - [2:28:21]\n",
      "06/27/2020 04:28:42 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [109500] Training Loss - [0.37095] Time Remaining - [2:22:50]\n",
      "06/27/2020 04:34:15 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [110000] Training Loss - [0.37015] Time Remaining - [2:17:16]\n",
      "06/27/2020 04:39:48 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [110500] Training Loss - [0.36936] Time Remaining - [2:11:43]\n",
      "06/27/2020 04:45:18 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [111000] Training Loss - [0.36862] Time Remaining - [2:06:06]\n",
      "06/27/2020 04:50:52 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [111500] Training Loss - [0.36786] Time Remaining - [2:00:34]\n",
      "06/27/2020 04:56:24 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [112000] Training Loss - [0.36715] Time Remaining - [1:54:59]\n",
      "06/27/2020 05:01:59 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [112500] Training Loss - [0.36641] Time Remaining - [1:49:28]\n",
      "06/27/2020 05:07:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [113000] Training Loss - [0.36561] Time Remaining - [1:43:54]\n",
      "06/27/2020 05:13:08 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [113500] Training Loss - [0.36493] Time Remaining - [1:38:22]\n",
      "06/27/2020 05:18:39 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [114000] Training Loss - [0.36422] Time Remaining - [1:32:48]\n",
      "06/27/2020 05:24:13 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [114500] Training Loss - [0.36346] Time Remaining - [1:27:15]\n",
      "06/27/2020 05:29:45 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [115000] Training Loss - [0.36276] Time Remaining - [1:21:41]\n",
      "06/27/2020 05:35:18 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [115500] Training Loss - [0.36208] Time Remaining - [1:16:08]\n",
      "06/27/2020 05:40:50 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [116000] Training Loss - [0.36137] Time Remaining - [1:10:34]\n",
      "06/27/2020 05:46:21 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [116500] Training Loss - [0.36070] Time Remaining - [1:05:00]\n",
      "06/27/2020 05:51:53 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [117000] Training Loss - [0.36006] Time Remaining - [0:59:27]\n",
      "06/27/2020 05:57:28 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [117500] Training Loss - [0.35940] Time Remaining - [0:53:54]\n",
      "06/27/2020 06:03:00 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [118000] Training Loss - [0.35885] Time Remaining - [0:48:21]\n",
      "06/27/2020 06:08:32 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [118500] Training Loss - [0.35815] Time Remaining - [0:42:48]\n",
      "06/27/2020 06:14:07 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [119000] Training Loss - [0.35751] Time Remaining - [0:37:15]\n",
      "06/27/2020 06:19:40 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [119500] Training Loss - [0.35685] Time Remaining - [0:31:42]\n",
      "06/27/2020 06:25:16 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [120000] Training Loss - [0.35616] Time Remaining - [0:26:09]\n",
      "06/27/2020 06:30:47 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [120500] Training Loss - [0.35558] Time Remaining - [0:20:35]\n",
      "06/27/2020 06:36:21 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [121000] Training Loss - [0.35499] Time Remaining - [0:15:02]\n",
      "06/27/2020 06:41:53 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [121500] Training Loss - [0.35438] Time Remaining - [0:09:29]\n",
      "06/27/2020 06:47:26 - mtdnn.modeling_mtdnn - INFO - Task - [ 0] Updates - [122000] Training Loss - [0.35374] Time Remaining - [0:03:56]\n",
      "06/27/2020 06:51:23 - mtdnn.modeling_mtdnn - INFO - Saving mt-dnn model to /tmp/tmpd9ok4aeo/checkpoint/model_4.pt\n",
      "06/27/2020 06:51:24 - mtdnn.modeling_mtdnn - INFO - model saved to /tmp/tmpd9ok4aeo/checkpoint/model_4.pt\n"
     ]
    }
   ],
   "source": [
    "model.fit(epochs=NUM_EPOCHS)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluation and Prediction\n",
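    "Run inference with the last checkpointed model. Epochs are numbered from 0, so after 5 epochs the last checkpoint is `model_4.pt`. Note that the last checkpoint is not guaranteed to be the best one on the dev set."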
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "06/27/2020 06:51:24 - mtdnn.modeling_mtdnn - INFO - Running predictions using: /tmp/tmpd9ok4aeo/checkpoint/model_4.pt\n",
      "06/27/2020 06:51:25 - mtdnn.modeling_mtdnn - INFO - predicting 0\n",
      "06/27/2020 06:51:45 - mtdnn.modeling_mtdnn - INFO - predicting 100\n",
      "06/27/2020 06:52:05 - mtdnn.modeling_mtdnn - INFO - predicting 200\n",
      "06/27/2020 06:52:27 - mtdnn.modeling_mtdnn - INFO - predicting 300\n",
      "06/27/2020 06:52:47 - mtdnn.modeling_mtdnn - INFO - predicting 400\n",
      "06/27/2020 06:53:07 - mtdnn.modeling_mtdnn - INFO - predicting 500\n",
      "06/27/2020 06:53:28 - mtdnn.modeling_mtdnn - INFO - predicting 600\n",
      "06/27/2020 06:53:48 - mtdnn.modeling_mtdnn - INFO - predicting 700\n",
      "06/27/2020 06:54:10 - mtdnn.modeling_mtdnn - INFO - predicting 800\n",
      "06/27/2020 06:54:30 - mtdnn.modeling_mtdnn - INFO - predicting 900\n",
      "06/27/2020 06:54:50 - mtdnn.modeling_mtdnn - INFO - predicting 1000\n",
      "06/27/2020 06:55:11 - mtdnn.modeling_mtdnn - INFO - predicting 1100\n",
      "06/27/2020 06:55:31 - mtdnn.modeling_mtdnn - INFO - predicting 1200\n",
      "06/27/2020 06:55:37 - mtdnn.modeling_mtdnn - INFO - Task mnli_mismatched -- epoch 0 -- Dev ACC: 84.422\n",
      "06/27/2020 06:55:37 - mtdnn.modeling_mtdnn - INFO - predicting 0\n",
      "06/27/2020 06:55:59 - mtdnn.modeling_mtdnn - INFO - predicting 100\n",
      "06/27/2020 06:56:19 - mtdnn.modeling_mtdnn - INFO - predicting 200\n",
      "06/27/2020 06:56:39 - mtdnn.modeling_mtdnn - INFO - predicting 300\n",
      "06/27/2020 06:57:00 - mtdnn.modeling_mtdnn - INFO - predicting 400\n",
      "06/27/2020 06:57:21 - mtdnn.modeling_mtdnn - INFO - predicting 500\n",
      "06/27/2020 06:57:42 - mtdnn.modeling_mtdnn - INFO - predicting 600\n",
      "06/27/2020 06:58:02 - mtdnn.modeling_mtdnn - INFO - predicting 700\n",
      "06/27/2020 06:58:22 - mtdnn.modeling_mtdnn - INFO - predicting 800\n",
      "06/27/2020 06:58:42 - mtdnn.modeling_mtdnn - INFO - predicting 900\n",
      "06/27/2020 06:59:04 - mtdnn.modeling_mtdnn - INFO - predicting 1000\n",
      "06/27/2020 06:59:24 - mtdnn.modeling_mtdnn - INFO - predicting 1100\n",
      "06/27/2020 06:59:45 - mtdnn.modeling_mtdnn - INFO - predicting 1200\n",
      "06/27/2020 06:59:50 - mtdnn.modeling_mtdnn - INFO - [new test scores saved.]\n",
      "06/27/2020 06:59:50 - mtdnn.modeling_mtdnn - INFO - predicting 0\n",
      "06/27/2020 07:00:10 - mtdnn.modeling_mtdnn - INFO - predicting 100\n",
      "06/27/2020 07:00:30 - mtdnn.modeling_mtdnn - INFO - predicting 200\n",
      "06/27/2020 07:00:52 - mtdnn.modeling_mtdnn - INFO - predicting 300\n",
      "06/27/2020 07:01:12 - mtdnn.modeling_mtdnn - INFO - predicting 400\n",
      "06/27/2020 07:01:32 - mtdnn.modeling_mtdnn - INFO - predicting 500\n",
      "06/27/2020 07:01:52 - mtdnn.modeling_mtdnn - INFO - predicting 600\n",
      "06/27/2020 07:02:14 - mtdnn.modeling_mtdnn - INFO - predicting 700\n",
      "06/27/2020 07:02:34 - mtdnn.modeling_mtdnn - INFO - predicting 800\n",
      "06/27/2020 07:02:55 - mtdnn.modeling_mtdnn - INFO - predicting 900\n",
      "06/27/2020 07:03:15 - mtdnn.modeling_mtdnn - INFO - predicting 1000\n",
      "06/27/2020 07:03:35 - mtdnn.modeling_mtdnn - INFO - predicting 1100\n",
      "06/27/2020 07:03:57 - mtdnn.modeling_mtdnn - INFO - predicting 1200\n",
      "06/27/2020 07:04:03 - mtdnn.modeling_mtdnn - INFO - Task mnli_matched -- epoch 0 -- Dev ACC: 84.144\n",
      "06/27/2020 07:04:03 - mtdnn.modeling_mtdnn - INFO - predicting 0\n",
      "06/27/2020 07:04:23 - mtdnn.modeling_mtdnn - INFO - predicting 100\n",
      "06/27/2020 07:04:43 - mtdnn.modeling_mtdnn - INFO - predicting 200\n",
      "06/27/2020 07:05:04 - mtdnn.modeling_mtdnn - INFO - predicting 300\n",
      "06/27/2020 07:05:24 - mtdnn.modeling_mtdnn - INFO - predicting 400\n",
      "06/27/2020 07:05:45 - mtdnn.modeling_mtdnn - INFO - predicting 500\n",
      "06/27/2020 07:06:06 - mtdnn.modeling_mtdnn - INFO - predicting 600\n",
      "06/27/2020 07:06:26 - mtdnn.modeling_mtdnn - INFO - predicting 700\n",
      "06/27/2020 07:06:46 - mtdnn.modeling_mtdnn - INFO - predicting 800\n",
      "06/27/2020 07:07:08 - mtdnn.modeling_mtdnn - INFO - predicting 900\n",
      "06/27/2020 07:07:28 - mtdnn.modeling_mtdnn - INFO - predicting 1000\n",
      "06/27/2020 07:07:48 - mtdnn.modeling_mtdnn - INFO - predicting 1100\n",
      "06/27/2020 07:08:09 - mtdnn.modeling_mtdnn - INFO - predicting 1200\n",
      "06/27/2020 07:08:15 - mtdnn.modeling_mtdnn - INFO - [new test scores saved.]\n"
     ]
    }
   ],
   "source": [
    "model.predict(trained_model_chckpt=f\"{OUTPUT_DIR}/model_4.pt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Mnli Mismatched Dev</th>\n",
       "      <th>Mnli Matched Dev</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ACCURACY</th>\n",
       "      <td>84.422</td>\n",
       "      <td>84.144</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Mnli Mismatched Dev Mnli Matched Dev\n",
       "ACCURACY              84.422           84.144"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results = {}\n",
    "# Collect the dev-set score files (.json) in OUTPUT_DIR\n",
    "dev_result_files = [fn for fn in os.listdir(OUTPUT_DIR) if fn.endswith('.json') and 'dev' in fn]\n",
    "for d in dev_result_files:\n",
    "    # Column label: the first three underscore-separated tokens of the file name, capitalized\n",
    "    name = ' '.join(token.capitalize() for token in d.split('_')[:3])\n",
    "    with open(os.path.join(OUTPUT_DIR, d), 'r') as f:\n",
    "        res = json.load(f)\n",
    "    # Store the accuracy as a string rounded to 3 decimal places\n",
    "    results[name] = {'ACCURACY': f\"{res['metrics']['ACC']:.3f}\"}\n",
    "df_results = pd.DataFrame(results)\n",
    "df_results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up temporary folders"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if os.path.exists(ROOT_DIR):\n",
    "    shutil.rmtree(ROOT_DIR, ignore_errors=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (nlp_gpu)",
   "language": "python",
   "name": "nlp_gpu"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
